Overview

Dataset statistics

Number of variables: 11
Number of observations: 935
Missing cells: 272
Missing cells (%): 2.6%
Duplicate rows: 0
Duplicate rows (%): 0.0%
Total size in memory: 80.5 KiB
Average record size in memory: 88.1 B
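
As a quick consistency check, the missing-cell figures can be re-derived from the counts reported in this overview. The following is a minimal sketch in plain Python; the variable names are invented for this illustration, and the per-column missing counts are the ones listed under Alerts.

```python
# Figures copied from the report; the names are illustrative only.
n_variables = 11
n_observations = 935
missing_by_column = {"meduc": 78, "feduc": 194}  # from the Alerts list

# Every variable contributes one cell per observation.
total_cells = n_variables * n_observations
missing_cells = sum(missing_by_column.values())
missing_pct = 100 * missing_cells / total_cells

print(missing_cells, round(missing_pct, 1))
```

The two columns with missing values account for all missing cells, and the share rounds to the percentage shown in the overview.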

Variable types

Numeric: 9
Categorical: 2

Alerts

IQ is highly overall correlated with educ and 1 other field (High correlation)
educ is highly overall correlated with IQ and 1 other field (High correlation)
meduc is highly overall correlated with feduc (High correlation)
feduc is highly overall correlated with meduc (High correlation)
exper is highly overall correlated with educ and 2 other fields (High correlation)
tenure is highly overall correlated with exper (High correlation)
age is highly overall correlated with exper (High correlation)
black is highly overall correlated with IQ (High correlation)
meduc has 78 (8.3%) missing values (Missing)
feduc has 194 (20.7%) missing values (Missing)
tenure has 30 (3.2%) zeros (Zeros)

Reproduction

Analysis started: 2023-01-25 22:04:57.845830
Analysis finished: 2023-01-25 22:05:14.720398
Duration: 16.87 seconds
Software version: pandas-profiling v3.5.0
Download configuration: config.json

Variables

wage
Real number (ℝ)

Distinct: 449
Distinct (%): 48.0%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 95794.545
Minimum: 11500
Maximum: 307800
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 7.4 KiB

Quantile statistics

Minimum: 11500
5-th percentile: 43790
Q1: 66900
median: 90500
Q3: 116000
95-th percentile: 169550
Maximum: 307800
Range: 296300
Interquartile range (IQR): 49100

Descriptive statistics

Standard deviation: 40436.082
Coefficient of variation (CV): 0.42211257
Kurtosis: 2.7175818
Mean: 95794.545
Median Absolute Deviation (MAD): 24900
Skewness: 1.2011868
Sum: 89567900
Variance: 1.6350767 × 10^9
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)
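
Several of the wage statistics follow from one another. The sketch below, in plain Python with hypothetical variable names, re-derives the mean, coefficient of variation, variance, range and IQR from the reported sum, observation count, standard deviation, extremes and quartiles. It is only a sanity check on the reported figures, not the way the report itself computes them.

```python
# Reported values for wage, copied from the report.
n = 935                    # number of observations (wage has no missing values)
total = 89567900           # Sum
std = 40436.082            # Standard deviation
minimum, maximum = 11500, 307800
q1, q3 = 66900, 116000

mean = total / n                 # Mean = Sum / count
cv = std / mean                  # Coefficient of variation = std / mean
variance = std ** 2              # Variance = std squared
value_range = maximum - minimum  # Range = Maximum - Minimum
iqr = q3 - q1                    # Interquartile range = Q3 - Q1

print(round(mean, 3), round(cv, 6), value_range, iqr)
```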
Value                 Count  Frequency (%)
100000                31     3.3%
125000                17     1.8%
80000                 15     1.6%
50000                 13     1.4%
96200                 13     1.4%
144200                12     1.3%
60000                 11     1.2%
90000                 11     1.2%
75000                 11     1.2%
120000                10     1.1%
Other values (439)    791    84.6%

Value                 Count  Frequency (%)
11500                 1      0.1%
20000                 1      0.1%
23300                 1      0.1%
26000                 1      0.1%
26500                 1      0.1%
28900                 1      0.1%
30000                 1      0.1%
31000                 1      0.1%
31800                 1      0.1%
32500                 2      0.2%

Value                 Count  Frequency (%)
307800                2      0.2%
277100                1      0.1%
266800                1      0.1%
250000                2      0.2%
240400                2      0.2%
231000                1      0.1%
230800                1      0.1%
216200                3      0.3%
213700                4      0.4%
209900                1      0.1%

hours
Real number (ℝ)

Distinct: 37
Distinct (%): 4.0%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 43.929412
Minimum: 20
Maximum: 80
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 7.4 KiB

Quantile statistics

Minimum: 20
5-th percentile: 38
Q1: 40
median: 40
Q3: 48
95-th percentile: 60
Maximum: 80
Range: 60
Interquartile range (IQR): 8

Descriptive statistics

Standard deviation: 7.2242559
Coefficient of variation (CV): 0.16445146
Kurtosis: 4.1866405
Mean: 43.929412
Median Absolute Deviation (MAD): 0
Skewness: 1.5961752
Sum: 41074
Variance: 52.189873
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=37)
Value                 Count  Frequency (%)
40                    497    53.2%
45                    97     10.4%
50                    91     9.7%
55                    41     4.4%
48                    35     3.7%
60                    32     3.4%
44                    19     2.0%
38                    15     1.6%
43                    14     1.5%
35                    11     1.2%
Other values (27)     83     8.9%

Value                 Count  Frequency (%)
20                    1      0.1%
23                    1      0.1%
24                    1      0.1%
25                    1      0.1%
27                    2      0.2%
30                    7      0.7%
32                    5      0.5%
34                    1      0.1%
35                    11     1.2%
36                    4      0.4%

Value                 Count  Frequency (%)
80                    4      0.4%
75                    2      0.2%
70                    7      0.7%
65                    7      0.7%
64                    1      0.1%
61                    1      0.1%
60                    32     3.4%
59                    1      0.1%
58                    2      0.2%
56                    3      0.3%

IQ
Real number (ℝ)

Distinct: 80
Distinct (%): 8.6%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 101.28235
Minimum: 50
Maximum: 145
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 7.4 KiB

Quantile statistics

Minimum: 50
5-th percentile: 74
Q1: 92
median: 102
Q3: 112
95-th percentile: 124.3
Maximum: 145
Range: 95
Interquartile range (IQR): 20

Descriptive statistics

Standard deviation: 15.052636
Coefficient of variation (CV): 0.14862052
Kurtosis: -0.016643599
Mean: 101.28235
Median Absolute Deviation (MAD): 10
Skewness: -0.34097187
Sum: 94699
Variance: 226.58186
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)
Value                 Count  Frequency (%)
96                    35     3.7%
104                   35     3.7%
109                   33     3.5%
98                    30     3.2%
97                    28     3.0%
110                   28     3.0%
105                   27     2.9%
106                   26     2.8%
101                   23     2.5%
108                   22     2.4%
Other values (70)     648    69.3%

Value                 Count  Frequency (%)
50                    1      0.1%
54                    1      0.1%
55                    1      0.1%
59                    1      0.1%
60                    1      0.1%
61                    1      0.1%
62                    2      0.2%
63                    1      0.1%
64                    2      0.2%
65                    1      0.1%

Value                 Count  Frequency (%)
145                   1      0.1%
137                   1      0.1%
134                   4      0.4%
132                   5      0.5%
131                   4      0.4%
130                   3      0.3%
129                   4      0.4%
128                   4      0.4%
127                   6      0.6%
126                   4      0.4%

educ
Real number (ℝ)

Distinct: 10
Distinct (%): 1.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 13.468449
Minimum: 9
Maximum: 18
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 7.4 KiB

Quantile statistics

Minimum: 9
5-th percentile: 11
Q1: 12
median: 12
Q3: 16
95-th percentile: 18
Maximum: 18
Range: 9
Interquartile range (IQR): 4

Descriptive statistics

Standard deviation: 2.1966539
Coefficient of variation (CV): 0.16309627
Kurtosis: -0.73486269
Mean: 13.468449
Median Absolute Deviation (MAD): 1
Skewness: 0.5486765
Sum: 12593
Variance: 4.8252883
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=10)
Value                 Count  Frequency (%)
12                    393    42.0%
16                    150    16.0%
13                    85     9.1%
14                    77     8.2%
18                    57     6.1%
15                    45     4.8%
11                    43     4.6%
17                    40     4.3%
10                    35     3.7%
9                     10     1.1%

Value                 Count  Frequency (%)
9                     10     1.1%
10                    35     3.7%
11                    43     4.6%
12                    393    42.0%
13                    85     9.1%
14                    77     8.2%
15                    45     4.8%
16                    150    16.0%
17                    40     4.3%
18                    57     6.1%

Value                 Count  Frequency (%)
18                    57     6.1%
17                    40     4.3%
16                    150    16.0%
15                    45     4.8%
14                    77     8.2%
13                    85     9.1%
12                    393    42.0%
11                    43     4.6%
10                    35     3.7%
9                     10     1.1%

exper
Real number (ℝ)

Distinct: 22
Distinct (%): 2.4%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 11.563636
Minimum: 1
Maximum: 23
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 7.4 KiB

Quantile statistics

Minimum: 1
5-th percentile: 5
Q1: 8
median: 11
Q3: 15
95-th percentile: 19
Maximum: 23
Range: 22
Interquartile range (IQR): 7

Descriptive statistics

Standard deviation: 4.3745864
Coefficient of variation (CV): 0.37830543
Kurtosis: -0.56379545
Mean: 11.563636
Median Absolute Deviation (MAD): 3
Skewness: 0.077800885
Sum: 10812
Variance: 19.137006
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=22)
Value                 Count  Frequency (%)
11                    89     9.5%
9                     82     8.8%
8                     72     7.7%
10                    72     7.7%
16                    68     7.3%
12                    65     7.0%
13                    62     6.6%
15                    60     6.4%
7                     54     5.8%
14                    54     5.8%
Other values (12)     257    27.5%

Value                 Count  Frequency (%)
1                     12     1.3%
3                     1      0.1%
4                     29     3.1%
5                     30     3.2%
6                     48     5.1%
7                     54     5.8%
8                     72     7.7%
9                     82     8.8%
10                    72     7.7%
11                    89     9.5%

Value                 Count  Frequency (%)
23                    2      0.2%
22                    3      0.3%
21                    12     1.3%
20                    14     1.5%
19                    23     2.5%
18                    30     3.2%
17                    53     5.7%
16                    68     7.3%
15                    60     6.4%
14                    54     5.8%

tenure
Real number (ℝ)

HIGH CORRELATION
ZEROS

Distinct: 23
Distinct (%): 2.5%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 7.2342246
Minimum: 0
Maximum: 22
Zeros: 30
Zeros (%): 3.2%
Negative: 0
Negative (%): 0.0%
Memory size: 7.4 KiB

Quantile statistics

Minimum: 0
5-th percentile: 1
Q1: 3
median: 7
Q3: 11
95-th percentile: 16
Maximum: 22
Range: 22
Interquartile range (IQR): 8

Descriptive statistics

Standard deviation: 5.0752058
Coefficient of variation (CV): 0.70155491
Kurtosis: -0.79859858
Mean: 7.2342246
Median Absolute Deviation (MAD): 4
Skewness: 0.4325322
Sum: 6764
Variance: 25.757714
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=23)
Value                 Count  Frequency (%)
1                     104    11.1%
2                     93     9.9%
3                     72     7.7%
9                     71     7.6%
5                     68     7.3%
4                     59     6.3%
10                    58     6.2%
7                     56     6.0%
12                    53     5.7%
8                     48     5.1%
Other values (13)     253    27.1%

Value                 Count  Frequency (%)
0                     30     3.2%
1                     104    11.1%
2                     93     9.9%
3                     72     7.7%
4                     59     6.3%
5                     68     7.3%
6                     19     2.0%
7                     56     6.0%
8                     48     5.1%
9                     71     7.6%

Value                 Count  Frequency (%)
22                    1      0.1%
21                    2      0.2%
20                    4      0.4%
19                    6      0.6%
18                    14     1.5%
17                    9      1.0%
16                    22     2.4%
15                    38     4.1%
14                    28     3.0%
13                    40     4.3%

age
Real number (ℝ)

Distinct: 11
Distinct (%): 1.2%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 33.080214
Minimum: 28
Maximum: 38
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 7.4 KiB

Quantile statistics

Minimum: 28
5-th percentile: 29
Q1: 30
median: 33
Q3: 36
95-th percentile: 38
Maximum: 38
Range: 10
Interquartile range (IQR): 6

Descriptive statistics

Standard deviation: 3.1078033
Coefficient of variation (CV): 0.093947496
Kurtosis: -1.257094
Mean: 33.080214
Median Absolute Deviation (MAD): 3
Skewness: 0.11873587
Sum: 30930
Variance: 9.6584411
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=11)
Value                 Count  Frequency (%)
30                    120    12.8%
32                    99     10.6%
38                    99     10.6%
31                    98     10.5%
36                    95     10.2%
29                    86     9.2%
37                    82     8.8%
33                    81     8.7%
34                    69     7.4%
35                    61     6.5%

Value                 Count  Frequency (%)
28                    45     4.8%
29                    86     9.2%
30                    120    12.8%
31                    98     10.5%
32                    99     10.6%
33                    81     8.7%
34                    69     7.4%
35                    61     6.5%
36                    95     10.2%
37                    82     8.8%

Value                 Count  Frequency (%)
38                    99     10.6%
37                    82     8.8%
36                    95     10.2%
35                    61     6.5%
34                    69     7.4%
33                    81     8.7%
32                    99     10.6%
31                    98     10.5%
30                    120    12.8%
29                    86     9.2%

married
Categorical

Distinct: 2
Distinct (%): 0.2%
Missing: 0
Missing (%): 0.0%
Memory size: 7.4 KiB
1: 835
0: 100

Length

Max length: 1
Median length: 1
Mean length: 1
Min length: 1

Characters and Unicode

Total characters: 935
Distinct characters: 2
Distinct categories: 1
Distinct scripts: 1
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 1
2nd row: 1
3rd row: 1
4th row: 1
5th row: 1

Common Values

Value                 Count  Frequency (%)
1                     835    89.3%
0                     100    10.7%

Length

Histogram of lengths of the category

Common Values (Plot)

Value                 Count  Frequency (%)
1                     835    89.3%
0                     100    10.7%

Most occurring characters

Value                 Count  Frequency (%)
1                     835    89.3%
0                     100    10.7%

Most occurring categories

Value                 Count  Frequency (%)
Decimal Number        935    100.0%

Most frequent character per category

Decimal Number
Value                 Count  Frequency (%)
1                     835    89.3%
0                     100    10.7%

Most occurring scripts

Value                 Count  Frequency (%)
Common                935    100.0%

Most frequent character per script

Common
Value                 Count  Frequency (%)
1                     835    89.3%
0                     100    10.7%

Most occurring blocks

Value                 Count  Frequency (%)
ASCII                 935    100.0%

Most frequent character per block

ASCII
Value                 Count  Frequency (%)
1                     835    89.3%
0                     100    10.7%

black
Categorical

Distinct: 2
Distinct (%): 0.2%
Missing: 0
Missing (%): 0.0%
Memory size: 7.4 KiB
0: 815
1: 120

Length

Max length: 1
Median length: 1
Mean length: 1
Min length: 1

Characters and Unicode

Total characters: 935
Distinct characters: 2
Distinct categories: 1
Distinct scripts: 1
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 0
2nd row: 0
3rd row: 0
4th row: 0
5th row: 0

Common Values

Value                 Count  Frequency (%)
0                     815    87.2%
1                     120    12.8%

Length

Histogram of lengths of the category

Common Values (Plot)

Value                 Count  Frequency (%)
0                     815    87.2%
1                     120    12.8%

Most occurring characters

Value                 Count  Frequency (%)
0                     815    87.2%
1                     120    12.8%

Most occurring categories

Value                 Count  Frequency (%)
Decimal Number        935    100.0%

Most frequent character per category

Decimal Number
Value                 Count  Frequency (%)
0                     815    87.2%
1                     120    12.8%

Most occurring scripts

Value                 Count  Frequency (%)
Common                935    100.0%

Most frequent character per script

Common
Value                 Count  Frequency (%)
0                     815    87.2%
1                     120    12.8%

Most occurring blocks

Value                 Count  Frequency (%)
ASCII                 935    100.0%

Most frequent character per block

ASCII
Value                 Count  Frequency (%)
0                     815    87.2%
1                     120    12.8%

meduc
Real number (ℝ)

HIGH CORRELATION
MISSING

Distinct: 19
Distinct (%): 2.2%
Missing: 78
Missing (%): 8.3%
Infinite: 0
Infinite (%): 0.0%
Mean: 10.682614
Minimum: 0
Maximum: 18
Zeros: 3
Zeros (%): 0.3%
Negative: 0
Negative (%): 0.0%
Memory size: 7.4 KiB

Quantile statistics

Minimum: 0
5-th percentile: 6
Q1: 8
median: 12
Q3: 12
95-th percentile: 16
Maximum: 18
Range: 18
Interquartile range (IQR): 4

Descriptive statistics

Standard deviation: 2.8497563
Coefficient of variation (CV): 0.26676583
Kurtosis: 0.94405474
Mean: 10.682614
Median Absolute Deviation (MAD): 1
Skewness: -0.4977403
Sum: 9155
Variance: 8.1211109
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=19)
Value                 Count  Frequency (%)
12                    357    38.2%
8                     129    13.8%
10                    65     7.0%
11                    56     6.0%
9                     47     5.0%
16                    42     4.5%
7                     31     3.3%
6                     30     3.2%
14                    28     3.0%
13                    21     2.2%
Other values (9)      51     5.5%
(Missing)             78     8.3%

Value                 Count  Frequency (%)
0                     3      0.3%
1                     1      0.1%
2                     5      0.5%
3                     9      1.0%
4                     6      0.6%
5                     8      0.9%
6                     30     3.2%
7                     31     3.3%
8                     129    13.8%
9                     47     5.0%

Value                 Count  Frequency (%)
18                    5      0.5%
17                    7      0.7%
16                    42     4.5%
15                    7      0.7%
14                    28     3.0%
13                    21     2.2%
12                    357    38.2%
11                    56     6.0%
10                    65     7.0%
9                     47     5.0%

feduc
Real number (ℝ)

HIGH CORRELATION
MISSING

Distinct: 18
Distinct (%): 2.4%
Missing: 194
Missing (%): 20.7%
Infinite: 0
Infinite (%): 0.0%
Mean: 10.217274
Minimum: 0
Maximum: 18
Zeros: 1
Zeros (%): 0.1%
Negative: 0
Negative (%): 0.0%
Memory size: 7.4 KiB

Quantile statistics

Minimum: 0
5-th percentile: 5
Q1: 8
median: 10
Q3: 12
95-th percentile: 16
Maximum: 18
Range: 18
Interquartile range (IQR): 4

Descriptive statistics

Standard deviation: 3.3006999
Coefficient of variation (CV): 0.32305094
Kurtosis: -0.028311983
Mean: 10.217274
Median Absolute Deviation (MAD): 2
Skewness: -0.043468976
Sum: 7571
Variance: 10.89462
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=18)
Value                 Count  Frequency (%)
12                    216    23.1%
8                     122    13.0%
10                    77     8.2%
6                     41     4.4%
11                    40     4.3%
9                     39     4.2%
16                    38     4.1%
7                     37     4.0%
14                    28     3.0%
5                     22     2.4%
Other values (8)      81     8.7%
(Missing)             194    20.7%

Value                 Count  Frequency (%)
0                     1      0.1%
2                     8      0.9%
3                     9      1.0%
4                     17     1.8%
5                     22     2.4%
6                     41     4.4%
7                     37     4.0%
8                     122    13.0%
9                     39     4.2%
10                    77     8.2%

Value                 Count  Frequency (%)
18                    16     1.7%
17                    6      0.6%
16                    38     4.1%
15                    7      0.7%
14                    28     3.0%
13                    17     1.8%
12                    216    23.1%
11                    40     4.3%
10                    77     8.2%
9                     39     4.2%

Interactions

[Figure: 81 pairwise interaction plots; images not preserved]

Correlations


Auto

The auto setting selects an interpretable pairwise metric for each pair of columns according to the following mapping:
  • Variable_type-Variable_type : Method, Range
  • Categorical-Categorical : Cramer's V, [0,1]
  • Numerical-Categorical : Cramer's V, [0,1] (using a discretized numerical column)
  • Numerical-Numerical : Spearman's ρ, [-1,1]
The number of bins used in the discretization for the Numerical-Categorical column pair can be changed using config.correlations["auto"].n_bins. The number of bins affects the granularity of the association you wish to measure.

This configuration uses the recommended metric for each pair of columns.
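
The mapping above can be read as a small dispatch on the two variable types. The function below is a hypothetical illustration of that table only (the name `auto_method` and the type labels are assumptions, not part of pandas-profiling's API), returning the method and its range for a pair of types.

```python
from typing import Tuple


def auto_method(type_a: str, type_b: str) -> Tuple[str, Tuple[int, int]]:
    """Return (method, (low, high)) for a pair of variable types.

    Mirrors the mapping listed in the text; this is a sketch, not the
    library's own implementation.
    """
    kinds = {type_a, type_b}
    if kinds == {"Numerical"}:
        # Numerical-Numerical
        return "Spearman's rho", (-1, 1)
    if kinds <= {"Numerical", "Categorical"}:
        # Categorical-Categorical, or Numerical-Categorical after the
        # numerical column has been discretized into bins.
        return "Cramer's V", (0, 1)
    raise ValueError(f"unsupported variable types: {type_a!r}, {type_b!r}")


print(auto_method("Numerical", "Numerical"))
print(auto_method("Numerical", "Categorical"))
```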

Spearman's ρ

Spearman's rank correlation coefficient (ρ) is a measure of monotonic correlation between two variables, and is therefore better at catching nonlinear monotonic correlations than Pearson's r. Its value lies between -1 and +1, with -1 indicating total negative monotonic correlation, 0 indicating no monotonic correlation, and +1 indicating total positive monotonic correlation.

To calculate ρ for two variables X and Y, one divides the covariance of the rank variables of X and Y by the product of their standard deviations.
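
A minimal plain-Python sketch of that recipe follows: rank both variables (giving ties their average rank), then divide the covariance of the ranks by the product of their standard deviations. The function names and the toy data are made up for illustration; the report's own figures come from the library, not from this code.

```python
def average_ranks(values):
    """Rank values from 1 to n; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def spearman_rho(x, y):
    """Covariance of the rank variables over the product of their std devs."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mean_rx, mean_ry = sum(rx) / n, sum(ry) / n
    # The 1/n factors of covariance and standard deviations cancel out.
    cov = sum((a - mean_rx) * (b - mean_ry) for a, b in zip(rx, ry))
    sd_x = sum((a - mean_rx) ** 2 for a in rx) ** 0.5
    sd_y = sum((b - mean_ry) ** 2 for b in ry) ** 0.5
    return cov / (sd_x * sd_y)


# Hypothetical data: y grows monotonically but not linearly with x.
x = [1, 2, 3, 4, 5]
y = [v ** 3 for v in x]
rho = spearman_rho(x, y)
```

Because only the ordering matters, the cubic relationship in the toy data still yields a ρ of essentially 1.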

Pearson's r

Pearson's correlation coefficient (r) is a measure of linear correlation between two variables. Its value lies between -1 and +1, with -1 indicating total negative linear correlation, 0 indicating no linear correlation, and +1 indicating total positive linear correlation. Furthermore, r is invariant under separate changes in location and scale of the two variables, implying that for a linear function the angle to the x-axis does not affect r.

To calculate r for two variables X and Y, one divides the covariance of X and Y by the product of their standard deviations.
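
The definition and the invariance property can be checked with a few lines of plain Python. This is a sketch with invented names and toy data; the rescaling uses positive scale factors, which is the case where r is left unchanged.

```python
def pearson_r(x, y):
    """Covariance of x and y divided by the product of their std devs."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    # The 1/n factors of covariance and standard deviations cancel out.
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = sum((a - mean_x) ** 2 for a in x) ** 0.5
    sd_y = sum((b - mean_y) ** 2 for b in y) ** 0.5
    return cov / (sd_x * sd_y)


# Hypothetical, roughly linear data.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1]

r_original = pearson_r(x, y)
# Shift and rescale each variable separately (positive scale factors).
r_rescaled = pearson_r([10 * a + 3 for a in x], [0.5 * b - 7 for b in y])
```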

Kendall's τ

Like Spearman's rank correlation coefficient, the Kendall rank correlation coefficient (τ) measures ordinal association between two variables. Its value lies between -1 and +1, with -1 indicating total negative correlation, 0 indicating no correlation, and +1 indicating total positive correlation.

To calculate τ for two variables X and Y, one determines the number of concordant and discordant pairs of observations. τ is given by the number of concordant pairs minus the number of discordant pairs, divided by the total number of pairs.
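
The pair-counting description translates directly into code. The sketch below implements exactly that simple form, with no correction for ties; statistical libraries commonly apply a tie-corrected variant, so values for tied data may differ from theirs. The function name and toy data are illustrative.

```python
from itertools import combinations


def kendall_tau(x, y):
    """(concordant - discordant) / total pairs, without tie correction."""
    concordant = discordant = 0
    for (x1, y1), (x2, y2) in combinations(zip(x, y), 2):
        sign = (x1 - x2) * (y1 - y2)
        if sign > 0:
            concordant += 1   # both variables move in the same direction
        elif sign < 0:
            discordant += 1   # the variables move in opposite directions
    n_pairs = len(x) * (len(x) - 1) // 2
    return (concordant - discordant) / n_pairs


# Hypothetical data: one swapped pair out of six possible pairs.
tau = kendall_tau([1, 2, 3, 4], [1, 3, 2, 4])
```

In the toy data five pairs are concordant and one is discordant, so τ is (5 - 1) / 6.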

Cramér's V (φc)

Cramér's V is an association measure for nominal random variables. The coefficient ranges from 0 to 1, with 0 indicating independence and 1 indicating perfect association. The empirical estimators used for Cramér's V have been shown to be biased, even for large samples. We use a bias-corrected measure proposed by Bergsma in 2013.
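
For reference, a bias-corrected Cramér's V along the lines of Bergsma's proposal can be sketched in plain Python as below. This is an illustration under stated assumptions, not necessarily the exact code path the report uses: the function name is made up, the input is a contingency table of counts as a list of rows, and every row and column total is assumed to be non-zero.

```python
import math


def cramers_v_corrected(table):
    """Bias-corrected Cramér's V for an r x k contingency table of counts."""
    r, k = len(table), len(table[0])
    n = sum(sum(row) for row in table)
    row_tot = [sum(row) for row in table]
    col_tot = [sum(table[i][j] for i in range(r)) for j in range(k)]

    # Pearson chi-squared statistic against the independence expectation.
    chi2 = 0.0
    for i in range(r):
        for j in range(k):
            expected = row_tot[i] * col_tot[j] / n
            chi2 += (table[i][j] - expected) ** 2 / expected

    phi2 = chi2 / n
    # Bias correction: shrink phi^2 and the table dimensions.
    phi2_corr = max(0.0, phi2 - (k - 1) * (r - 1) / (n - 1))
    r_corr = r - (r - 1) ** 2 / (n - 1)
    k_corr = k - (k - 1) ** 2 / (n - 1)
    return math.sqrt(phi2_corr / min(r_corr - 1, k_corr - 1))


# Hypothetical 2 x 2 tables: independent counts and perfectly associated counts.
v_independent = cramers_v_corrected([[10, 20], [20, 40]])
v_perfect = cramers_v_corrected([[50, 0], [0, 50]])
```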

Phik (φk)

Phik (φk) is a new and practical correlation coefficient that works consistently between categorical, ordinal and interval variables, captures non-linear dependency and reverts to the Pearson correlation coefficient in case of a bivariate normal input distribution. Extensive documentation is available.

Missing values

A simple visualization of nullity by column.
The nullity matrix is a data-dense display which lets you quickly pick out patterns in data completion visually.
The correlation heatmap measures nullity correlation: how strongly the presence or absence of one variable affects the presence of another.
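
One common way to quantify nullity correlation is the Pearson correlation between per-column missingness indicators. That choice is an assumption made here for illustration; the report does not state the formula. The sketch below uses plain Python, a made-up function name and toy columns in which `None` marks a missing value, and it assumes each column has at least one missing and one present value.

```python
def nullity_correlation(col_a, col_b):
    """Pearson correlation between the missingness indicators of two columns."""
    a = [1.0 if v is None else 0.0 for v in col_a]
    b = [1.0 if v is None else 0.0 for v in col_b]
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((p - mean_a) * (q - mean_b) for p, q in zip(a, b))
    var_a = sum((p - mean_a) ** 2 for p in a)
    var_b = sum((q - mean_b) ** 2 for q in b)
    # Undefined (division by zero) if a column is never or always missing.
    return cov / (var_a * var_b) ** 0.5


# Hypothetical toy columns, not taken from the dataset.
mother = [8.0, 14.0, None, 12.0, None, 7.0]
father = [8.0, None, None, 12.0, None, 7.0]
corr = nullity_correlation(mother, father)
```

A value near +1 means the two columns tend to be missing together, a value near -1 means one tends to be present when the other is missing, and a value near 0 means missingness in one says little about the other.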

Sample

     wage      hours  IQ   educ  exper  tenure  age  married  black  meduc  feduc
0    76900.0   40     93   12    11     2       31   1        0      8.0    8.0
1    80800.0   50     119  18    11     16      37   1        0      14.0   14.0
2    82500.0   40     108  14    11     9       33   1        0      14.0   14.0
3    65000.0   40     96   12    13     7       32   1        0      12.0   12.0
4    56200.0   40     74   11    14     5       34   1        0      6.0    11.0
5    140000.0  40     116  16    14     2       35   1        1      8.0    NaN
6    60000.0   40     91   10    13     0       30   0        0      8.0    8.0
7    108100.0  40     114  18    8      14      38   1        0      8.0    NaN
8    115400.0  45     111  15    13     1       36   1        0      14.0   5.0
9    100000.0  40     95   12    16     16      36   1        0      12.0   11.0

     wage      hours  IQ   educ  exper  tenure  age  married  black  meduc  feduc
925  64500.0   45     93   12    11     3       35   1        0      7.0    8.0
926  78800.0   40     100  11    15     6       32   1        1      9.0    NaN
927  64400.0   42     101  12    11     5       33   1        0      12.0   NaN
928  47700.0   45     100  12    9      3       31   1        0      7.0    7.0
929  66400.0   60     82   16    10     9       34   1        1      16.0   16.0
930  52000.0   40     79   16    6      1       30   1        1      11.0   NaN
931  120200.0  40     102  13    10     3       31   1        0      8.0    6.0
932  53800.0   45     77   12    12     10      28   1        1      7.0    NaN
933  87300.0   44     109  12    12     12      28   1        0      NaN    11.0
934  100000.0  40     107  12    17     18      35   1        0      NaN    NaN